@ilstam ilstam commented Jan 7, 2026

Changes

These patches address the following in the VM shutdown path:

  • Duplicate calls to terminate and join the vCPU threads and duplicate messages in the logs
  • KVM_EXIT_SHUTDOWN and KVM_EXIT_HLT being treated as successful terminations. The first is an error condition that signals a triple fault on x86; the second should never occur because we use the in-kernel LAPIC
  • Code comments that explain the shutdown path incorrectly

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

codecov bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.22%. Comparing base (64e6a1a) to head (ad9f5bd).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/vmm/src/lib.rs 83.33% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5611   +/-   ##
=======================================
  Coverage   83.22%   83.22%           
=======================================
  Files         277      277           
  Lines       29279    29275    -4     
=======================================
- Hits        24367    24364    -3     
+ Misses       4912     4911    -1     
Flag                  Coverage Δ
5.10-m5n.metal        83.56% <83.33%> (-0.02%) ⬇️
5.10-m6a.metal        82.90% <83.33%> (+<0.01%) ⬆️
5.10-m6g.metal        80.17% <83.33%> (-0.01%) ⬇️
5.10-m6i.metal        83.55% <83.33%> (-0.02%) ⬇️
5.10-m7a.metal-48xl   82.89% <83.33%> (+<0.01%) ⬆️
5.10-m7g.metal        80.17% <83.33%> (-0.01%) ⬇️
5.10-m7i.metal-24xl   83.54% <83.33%> (-0.01%) ⬇️
5.10-m7i.metal-48xl   83.54% <83.33%> (+<0.01%) ⬆️
5.10-m8g.metal-24xl   80.16% <83.33%> (-0.01%) ⬇️
5.10-m8g.metal-48xl   80.16% <83.33%> (-0.01%) ⬇️
6.1-m5n.metal         83.59% <83.33%> (+<0.01%) ⬆️
6.1-m6a.metal         82.92% <83.33%> (-0.01%) ⬇️
6.1-m6g.metal         80.16% <83.33%> (-0.02%) ⬇️
6.1-m6i.metal         83.59% <83.33%> (-0.01%) ⬇️
6.1-m7a.metal-48xl    82.91% <83.33%> (-0.01%) ⬇️
6.1-m7g.metal         80.16% <83.33%> (-0.01%) ⬇️
6.1-m7i.metal-24xl    83.60% <83.33%> (-0.01%) ⬇️
6.1-m7i.metal-48xl    83.60% <83.33%> (-0.01%) ⬇️
6.1-m8g.metal-24xl    80.16% <83.33%> (-0.01%) ⬇️
6.1-m8g.metal-48xl    80.16% <83.33%> (-0.01%) ⬇️


@ilstam ilstam force-pushed the vm-shutdown branch 2 times, most recently from 50a24be to 043098e Compare January 7, 2026 18:39
@JackThomson2 JackThomson2 added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Jan 8, 2026
ShadowCurse previously approved these changes Jan 8, 2026

@bchalios bchalios left a comment

Generally LGTM, and thanks for the deep dive and the nice clarification of the various exit codes and potential code paths. Just a comment regarding a comment :). Also, in your second commit you mention:

> On x86 the guest asks for a CPU reset via the i8042 controller which Firecracker intercepts and kills the VM

I'm not sure this is entirely correct. IIUC, it's the VMM that raises an interrupt via the i8042 to the guest, then the guest somehow starts the process of resetting which ends up being a KVM exit handled by VMM again.

ilstam commented Jan 9, 2026

> > On x86 the guest asks for a CPU reset via the i8042 controller which Firecracker intercepts and kills the VM
>
> I'm not sure this is entirely correct. IIUC, it's the VMM that raises an interrupt via the i8042 to the guest, then the guest somehow starts the process of resetting which ends up being a KVM exit handled by VMM again.

On x86 a CPU RESET is requested via an i8042 command. Firecracker intercepts this and writes to an eventfd which starts the termination process:
https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/devices/legacy/i8042.rs#L263

> it's the VMM that raises an interrupt via the i8042 to the guest

Are you referring to this FC API? https://github.com/firecracker-microvm/firecracker/blob/main/docs/api_requests/actions.md#intel-and-amd-only-sendctrlaltdel

This simply injects a Ctrl+Alt+Del sequence into the guest. The guest is free to interpret this however it wants, or to ignore it. Depending on how it is configured, Linux might start a soft reboot and eventually request a CPU reset via the i8042 command discussed above. There's no dedicated KVM exit (other than the port I/O exit that happens when the guest accesses the i8042).
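
To make the reset path described above concrete, here is a minimal sketch, not the actual Firecracker device. It assumes the conventional PC layout where the i8042 command register sits at port 0x64 and command 0xFE pulses the CPU reset line. `I8042Sketch`, `port_write` and `reset_requested` are hypothetical names, and a std channel stands in for the eventfd that the real device writes to.

```rust
use std::sync::mpsc::{Receiver, Sender};

// Conventional i8042 values (assumed here, not taken from the Firecracker source):
// the command register lives at port 0x64, and 0xFE asks for a CPU reset.
const I8042_COMMAND_PORT: u16 = 0x64;
const CMD_RESET_CPU: u8 = 0xFE;

/// Hypothetical stand-in for the emulated controller. The `Sender` plays the
/// role of the eventfd that signals the main VMM thread.
struct I8042Sketch {
    reset_evt: Sender<()>,
}

impl I8042Sketch {
    /// Called when a guest write is routed to the device.
    /// Returns true if the write was interpreted as a CPU reset request.
    fn port_write(&mut self, port: u16, data: u8) -> bool {
        if port == I8042_COMMAND_PORT && data == CMD_RESET_CPU {
            // Notify the main thread; ignore the error if the receiver is gone.
            let _ = self.reset_evt.send(());
            return true;
        }
        false
    }
}

/// Main-thread side: consumes one pending reset request, if there is one.
fn reset_requested(rx: &Receiver<()>) -> bool {
    rx.try_recv().is_ok()
}
```

The point of the sketch is that the reset request is an ordinary device access followed by a notification to the main thread; no vCPU-level shutdown exit is involved.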

bchalios commented Jan 9, 2026

> On x86 a CPU RESET is requested via an i8042 command. Firecracker intercepts this and writes to an eventfd which starts the termination process:
> https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/devices/legacy/i8042.rs#L263

So, is it the case that even when we don't send the Ctrl+Alt+Delete, the guest uses the i8042 device to reset?

(This is just for my education. The PR looks good).

bchalios previously approved these changes Jan 9, 2026
@bchalios bchalios enabled auto-merge (rebase) January 9, 2026 10:56
@ilstam ilstam disabled auto-merge January 9, 2026 11:17

ilstam commented Jan 9, 2026

> So, is it the case that even when we don't send the Ctrl+Alt+Delete, the guest uses the i8042 device to reset?

Yes. If you type 'reboot' in the guest, it will (eventually) ask for a CPU reset via the i8042. If you inject a Ctrl+Alt+Delete sequence, it might ignore it or treat it as if you had typed 'reboot' inside the guest. If you type 'poweroff', I think it will try to power off via ACPI, but since we don't emulate ACPI power management, the guest just hangs while the VM remains alive (presumably we re-enter the vCPUs but the guest runs an infinite loop).

ilstam added 2 commits January 9, 2026 11:50
Commit 79c5f79 ("Shutdown VCPU threads so they can
thread::join()") introduced a Vmm::stop() function that should run in
the main VMM thread when a vCPU thread writes to the vcpus_exit_evt
eventfd. Vmm::stop() sends a termination event to the vCPU threads and
joins them. That commit claims that there was a cyclical dependency
where vCPU objects referenced the Vmm objects and therefore Vmm::stop()
could not be called from Vmm::drop().

Later, commit 1d45fcc ("vmm: fix drop logic for Vmm") introduced a
second call to Vmm::stop(), this time from Vmm::drop().

As a result, when killing the VMM today (by issuing a reboot in the
guest) Vmm::stop() is called twice and the "Vmm is stopping" message
appears twice in the firecracker logs.

Additionally, the documentation in Vmm::stop() is incorrect. Not all
teardown paths call vcpu.exit(). In fact, the most common teardown path,
in which the guest asks for a CPU reset via the i8042 controller,
writes to the corresponding eventfd directly and vcpu.exit() is never
called.

Today, the vCPU threads do not hold references to the Vmm object and
therefore it's fine to join the vCPU threads in Vmm::drop().

Eliminate the double call to Vmm::stop() (and the duplicate log message)
and move the vCPU thread termination and join logic to Vmm::drop().
Vmm::stop() now simply sets Vmm::shutdown_exit_code which causes the VMM
to break out of its main event loop and start the termination process.
Additionally, remove part of the documentation that was already
incorrect or becomes incorrect after this change.
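
The split of responsibilities described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Firecracker code: `VmmSketch` and its fields are hypothetical, an `AtomicBool` stands in for the termination event sent to the vCPU threads, and the "vCPU" threads just spin until told to stop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical, heavily simplified stand-in for the VMM object.
struct VmmSketch {
    shutdown_exit_code: Option<i32>,
    stop_flag: Arc<AtomicBool>,
    vcpu_threads: Vec<JoinHandle<()>>,
}

impl VmmSketch {
    fn new(vcpu_count: usize) -> Self {
        let stop_flag = Arc::new(AtomicBool::new(false));
        let vcpu_threads: Vec<JoinHandle<()>> = (0..vcpu_count)
            .map(|_| {
                let stop = Arc::clone(&stop_flag);
                // The thread holds only the flag, never the VmmSketch itself,
                // so joining it from Drop cannot create a reference cycle.
                thread::spawn(move || {
                    while !stop.load(Ordering::Acquire) {
                        thread::sleep(Duration::from_millis(1));
                    }
                })
            })
            .collect();
        VmmSketch { shutdown_exit_code: None, stop_flag, vcpu_threads }
    }

    /// Only records the exit code; an event loop would observe it and break out.
    fn stop(&mut self, exit_code: i32) {
        self.shutdown_exit_code = Some(exit_code);
    }
}

impl Drop for VmmSketch {
    /// Thread termination and joining happen exactly once, here.
    fn drop(&mut self) {
        self.stop_flag.store(true, Ordering::Release);
        for handle in self.vcpu_threads.drain(..) {
            let _ = handle.join();
        }
    }
}
```

With this shape, calling `stop()` more than once is harmless, and the teardown work runs a single time regardless of which path caused the VMM to exit.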

Signed-off-by: Ilias Stamatis <[email protected]>

This is almost a pure revert of commit 3a9a1ac ("exit with success
code on certain KVM_EXIT events") which added code that treats
KVM_EXIT_SHUTDOWN and KVM_EXIT_HLT as successful VM terminations.

KVM_EXIT_SHUTDOWN is an exit code that KVM uses when an x86 CPU triple
faults.

KVM_EXIT_HLT is the exit code that KVM uses when the guest executes an
x86 HLT instruction and KVM doesn't emulate the irqchip. Since we're
using the in-kernel irqchip we should never see a KVM_EXIT_HLT exit, as
HLT instructions are emulated by KVM and do not cause userspace exits.

Do not return Ok(VcpuEmulation::Stopped) for these exit types since that
ends up propagating an FcExitCode::Ok code to the main thread, even
though these are abnormal terminations (especially the triple-fault
one).

Remove special handling for these x86-specific exit reasons and treat
them as any other unexpected exit reason.

Also, replace an incorrect comment that says that vCPUs exit with
KVM_EXIT_SHUTDOWN or KVM_EXIT_HLT when the guest issues a reboot. On x86
the guest asks for a CPU reset via the i8042 controller which
Firecracker intercepts and kills the VM. On ARM KVM exits to userspace
with the reason KVM_EXIT_SYSTEM_EVENT (which we already handle
correctly).
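
A minimal sketch of the resulting exit handling, under stated assumptions: `ExitReason`, `VcpuError` and `handle_exit` are hypothetical names, the enum covers only a tiny illustrative subset of exit reasons, and mapping the system event to a clean stop is an assumption about what "handle correctly" means here. Only `VcpuEmulation::Stopped` is a name taken from the commit message.

```rust
/// Hypothetical subset of KVM exit reasons, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExitReason {
    Io,          // a device access that userspace emulates
    SystemEvent, // KVM_EXIT_SYSTEM_EVENT (the ARM reboot path mentioned above)
    Shutdown,    // KVM_EXIT_SHUTDOWN (triple fault on x86)
    Hlt,         // KVM_EXIT_HLT (not expected with an in-kernel irqchip)
}

#[derive(Debug, PartialEq)]
enum VcpuEmulation {
    Handled,
    Stopped,
}

#[derive(Debug, PartialEq)]
enum VcpuError {
    UnexpectedExit(ExitReason),
}

fn handle_exit(reason: ExitReason) -> Result<VcpuEmulation, VcpuError> {
    match reason {
        ExitReason::Io => Ok(VcpuEmulation::Handled),
        // Assumption: a guest-requested system event is a clean stop.
        ExitReason::SystemEvent => Ok(VcpuEmulation::Stopped),
        // No special case for Shutdown or Hlt: they fall through to the
        // generic "unexpected exit" error instead of reporting success.
        other => Err(VcpuError::UnexpectedExit(other)),
    }
}
```

The design choice is that a clean stop has to be named explicitly in the match; anything else, including a triple fault, surfaces as an error to the caller.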

Since we're in the neighbourhood, get rid of a stale (8-year-old) TODO
comment questioning whether we should kill the VM when we get an
unexpected KVM exit reason. Terminating the VM as we have been doing is
completely reasonable.

Fixes: 3a9a1ac ("exit with success code on certain KVM_EXIT events")
Signed-off-by: Ilias Stamatis <[email protected]>
@ilstam ilstam dismissed stale reviews from bchalios and ShadowCurse via ad9f5bd January 9, 2026 12:02
@ilstam ilstam requested review from ShadowCurse and bchalios January 9, 2026 12:47
@ilstam ilstam merged commit 841ce52 into firecracker-microvm:main Jan 9, 2026
7 checks passed